Abstract

In this paper, we address the problem of case-based learning
in the presence of irrelevant features. We review previous work
on attribute selection and present a new algorithm, Oblivion,
that carries out greedy pruning of oblivious decision trees,
which effectively store a set of abstract cases in memory. We
hypothesize that this approach will efficiently identify relevant
features even when they interact, as in parity concepts. We
report experimental results on artificial domains that support
this hypothesis, and experiments with natural domains that show
improvement in some cases but not others. In closing, we discuss
the implications of our experiments, consider additional work on
irrelevant features, and outline some directions for future
research.

1.  Introduction 

Effective case-based reasoning relies on the identification
of a subset of features that are relevant to the learning task.
Most work on this topic assumes the developer makes this
decision, but application of case-based methods to complex new
domains would be aided by automated methods for feature
selection. Some researchers (e.g., Barletta & Mark, 1988; Cain,
Pazzani, & Silverstein, 1991) have explored the use of
domain-specific background knowledge to select useful features,
but this approach will not work when little domain knowledge is
available. Domain-independent methods for feature selection
would augment the techniques available for developing case-based
systems.

Rather than selecting features, one might employ all available
features during case retrieval, giving them equal weight in this
process. Cover and Hart (1967) have proven that a simple nearest
neighbor algorithm, probably the simplest case-based method, has
excellent asymptotic accuracy. However, more recent theoretical
analyses (Langley & Iba, 1993) and experimental studies (Aha,
1990) suggest that the empirical sample complexity of nearest
neighbor methods is exponential in the number of irrelevant
features. This means that the presence of irrelevant attributes
can slow the rate of case-based learning drastically.


A natural response is to draw on machine learning techniques
to identify those attributes relevant to the task at hand. For
example, Cardie (1993) used a decision-tree method (C4.5) to
select features for use during case retrieval. She passed on to
a k nearest neighbor algorithm only the features occurring in
the induced decision tree. She reported good results in a
natural language domain, with k nearest neighbor in the reduced
space outperforming both C4.5 and k nearest neighbor using all
the features.

Unfortunately, although the greedy approach of C4.5 works well
for conjunctive and m of n concepts, it suffers when attribute
interactions exist. In this case, a relevant feature in
isolation may appear no more discriminating than an irrelevant
one. Parity concepts constitute the most extreme example of
this situation. Experimental studies (Almuallim & Dietterich,
1991; Kira & Rendell, 1992) confirm that, for some target
concepts, decision-tree methods deal poorly with irrelevant
features.
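
The effect on parity can be checked directly. The following is a minimal sketch, not code from the original study. It computes the information gain of each single feature over the full instance space, for a three-bit parity concept and a three-feature conjunction, each with two irrelevant features. The smaller concept size, the helper names `entropy` and `info_gain`, and the use of plain information gain as the splitting measure are all assumptions made for illustration.

```python
from itertools import product
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * log2(c / n) for c in counts.values())

def info_gain(rows, labels, feature):
    """Reduction in class entropy from splitting on one Boolean feature."""
    gain = entropy(labels)
    for value in (0, 1):
        subset = [y for x, y in zip(rows, labels) if x[feature] == value]
        if subset:
            gain -= len(subset) / len(labels) * entropy(subset)
    return gain

# All 2**5 instances: features 0-2 are relevant, features 3-4 are irrelevant.
rows = list(product((0, 1), repeat=5))
parity = [(x[0] + x[1] + x[2]) % 2 for x in rows]
conjunction = [int(x[0] and x[1] and x[2]) for x in rows]

parity_gains = [info_gain(rows, parity, f) for f in range(5)]
conj_gains = [info_gain(rows, conjunction, f) for f in range(5)]
print("parity:     ", [round(g, 3) for g in parity_gains])
print("conjunction:", [round(g, 3) for g in conj_gains])
```

Under these assumptions every feature, relevant or not, has zero gain on the parity concept, whereas the three relevant features stand out on the conjunction.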

Almuallim and Dietterich’s Focus (1990) tried to address this
difficulty by searching for combinations of features that
discriminate the classes. The accuracy of this method is almost
unaffected by the introduction of irrelevant attributes, but its
time complexity is quasi-polynomial in the number of attributes.
Schlimmer (1993) presented a related technique that uses
knowledge about the partial ordering of the space to reduce the
search, but still had to limit the complexity of learnable
target concepts to keep the search within bounds. Thus, there
remains a need for more practical algorithms that can handle
domains with complex feature interactions and irrelevant
attributes.

In the following pages, we present a new algorithm - Oblivion -
that should handle irrelevant features in a more efficient
manner than Almuallim and Dietterich’s or Schlimmer’s
techniques, and we show how the method can be viewed as
identifying and storing abstract cases. We report experimental
studies of Oblivion’s behavior on both artificial and natural
domains, and we draw some tentative conclusions about the
approach to feature selection it embodies. Finally, we consider
some additional related work and suggest directions for future
research on this topic.

2. Induction of Oblivious Decision Trees

Our research goal was to develop an algorithm that handled
both irrelevant features and attribute interactions without
resorting to expensive, enumerative search. Our response draws
upon the realization that both Almuallim and Dietterich’s and
Schlimmer’s approaches construct oblivious decision trees, in
which all nodes at the same level test the same attribute.
Although these methods use forward selection (i.e., top-down
search) to construct oblivious decision trees, one can also
start with a full oblivious decision tree that includes all the
attributes, and then use pruning or backward elimination to
remove features that do not aid classification accuracy. The
advantage of the latter approach is that accuracy decreases
substantially when one removes a single relevant attribute, even
if it interacts with other features, but remains unaffected
when one prunes an irrelevant or redundant feature.
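
This asymmetry can be illustrated with a small sketch. It assumes a three-bit parity concept with two irrelevant features, enumerated over the full instance space, and it scores a feature subset by the accuracy of majority-class prediction within each projection of the data. The helper `table_accuracy` and the use of accuracy on the training data itself (rather than a cross-validated estimate) are simplifications introduced here, not details taken from the study.

```python
from collections import Counter, defaultdict
from itertools import product

def table_accuracy(rows, labels, features):
    """Accuracy of predicting the majority class within each group of
    instances that agree on the retained features."""
    groups = defaultdict(Counter)
    for x, y in zip(rows, labels):
        groups[tuple(x[f] for f in features)][y] += 1
    correct = sum(max(dist.values()) for dist in groups.values())
    return correct / len(rows)

# Three-bit parity over features 0-2; features 3 and 4 are irrelevant.
rows = list(product((0, 1), repeat=5))
labels = [(x[0] + x[1] + x[2]) % 2 for x in rows]

full = table_accuracy(rows, labels, [0, 1, 2, 3, 4])
drop_irrelevant = table_accuracy(rows, labels, [0, 1, 2, 3])
drop_relevant = table_accuracy(rows, labels, [1, 2, 3, 4])
single = [table_accuracy(rows, labels, [f]) for f in range(5)]
print(full, drop_irrelevant, drop_relevant, single)
```

In this setting, pruning an irrelevant feature leaves accuracy at 1.0, pruning a relevant one drops it to 0.5, and no single feature taken alone does better than 0.5, which is why forward selection has nothing to guide its first step.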

Oblivion is an algorithm that instantiates this idea. The
method begins with a full oblivious tree that incorporates all
potentially relevant attributes and estimates this tree’s
accuracy on the entire training set, using a conservative
technique like n-way cross validation. Oblivion then removes
each attribute in turn, estimates the accuracy of the resulting
tree in each case, and selects the most accurate. If this tree
makes no more errors than the initial one, Oblivion replaces the
initial tree with it and continues the process. On each step,
the algorithm tentatively prunes each of the remaining features,
selects the best, and generates a new tree with one fewer
attribute. This continues until the accuracy of the best pruned
tree is less than the accuracy of the current one. Unlike Focus
and Schlimmer’s method, Oblivion’s time complexity is polynomial
in the number of features, growing with the square of this
factor.
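
The pruning loop described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: accuracy is estimated by leave-one-out nearest neighbor with Hamming distance on the retained features, ties among equally near neighbors are settled by majority vote, and at least one feature is always retained. The function names `loo_errors` and `backward_eliminate` and the toy parity data are hypothetical.

```python
from collections import Counter
from itertools import product

def loo_errors(rows, labels, features):
    """Leave-one-out errors of nearest neighbor restricted to `features`.

    Ties among equally near neighbors are settled by majority vote.
    """
    errors = 0
    for i, (x, y) in enumerate(zip(rows, labels)):
        best, votes = None, Counter()
        for j, (z, label) in enumerate(zip(rows, labels)):
            if i == j:
                continue  # hold out the instance being classified
            d = sum(x[f] != z[f] for f in features)
            if best is None or d < best:
                best, votes = d, Counter([label])
            elif d == best:
                votes[label] += 1
        errors += votes.most_common(1)[0][0] != y
    return errors

def backward_eliminate(rows, labels):
    """Greedily drop features while the error estimate does not increase."""
    features = list(range(len(rows[0])))
    current = loo_errors(rows, labels, features)
    while len(features) > 1:
        # Tentatively prune each remaining feature and keep the best result.
        trials = [(loo_errors(rows, labels, [g for g in features if g != f]), f)
                  for f in features]
        best_errors, drop = min(trials)
        if best_errors > current:
            break  # every pruned tree is less accurate than the current one
        features.remove(drop)
        current = best_errors
    return features

# Three-bit parity over features 0-2; features 3 and 4 are irrelevant.
rows = list(product((0, 1), repeat=5))
labels = [(x[0] + x[1] + x[2]) % 2 for x in rows]
selected = backward_eliminate(rows, labels)
print(selected)
```

On this toy data the sketch retains exactly the three parity bits. Each pass evaluates at most one candidate per remaining feature, and there are at most as many passes as features, so the number of accuracy estimates grows with the square of the number of features.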

There remain a few problematic details, such as constructing
an initial tree that is exponential in the number of initial
attributes, determining the order of the retained attributes,
and passing the results to some learning method. However, none
of these steps is actually necessary. The key lies in realizing
that an oblivious decision tree is equivalent to a nearest
neighbor scheme that ignores some features. In this view, each
path through the tree corresponds to an abstract case that
summarizes an entire set of training instances. Because pruning
can produce impure partitions of the training set, each such
case specifies a distribution of class values. When an instance
matches a case’s conditions, it simply predicts the most likely
class. If training data are sparse and a test instance fails to
match any stored abstract case, one finds the nearest cases
(i.e., with the most matched conditions), sums the class
distributions for each one, and predicts the most likely class.
This insight into the relation between oblivious decision trees
and nearest neighbor algorithms was an unexpected benefit of our
work.
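
The abstract-case view can be made concrete with a short sketch. It assumes nominal features stored as tuples and represents each abstract case as a mapping from the retained feature values to a count of class labels. The names `build_abstract_cases` and `predict`, and the tiny data set, are hypothetical and serve only to illustrate the matching rule described above.

```python
from collections import Counter, defaultdict

def build_abstract_cases(rows, labels, features):
    """Map each projection onto the retained features to a class distribution."""
    cases = defaultdict(Counter)
    for x, y in zip(rows, labels):
        cases[tuple(x[f] for f in features)][y] += 1
    return dict(cases)

def predict(cases, features, x):
    """Predict the most likely class for instance x."""
    key = tuple(x[f] for f in features)
    if key in cases:
        # Exact match: use that abstract case's class distribution.
        return cases[key].most_common(1)[0][0]
    # No exact match: pool the distributions of the cases that share
    # the largest number of conditions with the instance.
    best, pooled = -1, Counter()
    for conditions, dist in cases.items():
        matched = sum(a == b for a, b in zip(conditions, key))
        if matched > best:
            best, pooled = matched, Counter(dist)
        elif matched == best:
            pooled += dist
    return pooled.most_common(1)[0][0]

rows = [(0, 0, 0), (0, 0, 1), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
labels = ["neg", "neg", "pos", "pos", "pos"]
features = [0, 1]  # the third feature has been pruned

cases = build_abstract_cases(rows, labels, features)
print(predict(cases, features, (0, 0, 1)))  # exact match on an impure case
print(predict(cases, features, (1, 0, 0)))  # no exact match, pooled cases
```

Here the five training instances collapse into two abstract cases, one of them impure. The second query matches neither case exactly, so the two equally near distributions are summed before the most likely class is chosen.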


3. Experimental Studies of Oblivion

We expected Oblivion to scale well to domains that involve many
irrelevant features. To test this prediction, we designed an
experimental study with four artificial Boolean domains that
varied both the degree of feature interaction and the number of
irrelevant features. We examined two target concepts - five-bit
parity and a five-feature conjunction - in the presence of both
zero and three irrelevant attributes. For each condition, we
randomly generated 20 sets of 200 training cases and 100 test
cases, and measured classification accuracy on the latter. In
addition to varying the two domain characteristics, we also
examined three induction algorithms - simple nearest neighbor
(which does not carry out attribute selection), C4.5 (which
employs a forward greedy selection), and Oblivion (i.e., nearest
neighbor with backward greedy selection). Finally, we varied the
number of training instances available before testing, to obtain
learning curves.
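
A data generator for the four conditions might look like the sketch below. The text does not state how instances were sampled, so drawing each Boolean feature independently and uniformly is an assumption, as are the fixed random seed and the names `make_dataset` and `conditions`.

```python
import random

def parity(bits):
    """Five-bit parity: 1 when an odd number of relevant bits are set."""
    return sum(bits) % 2

def conjunction(bits):
    """Five-feature conjunction: 1 only when every relevant bit is set."""
    return int(all(bits))

def make_dataset(concept, n_relevant, n_irrelevant, n_cases, rng):
    """Sample Boolean instances; only the first n_relevant bits set the class."""
    rows, labels = [], []
    for _ in range(n_cases):
        x = tuple(rng.randint(0, 1) for _ in range(n_relevant + n_irrelevant))
        rows.append(x)
        labels.append(concept(x[:n_relevant]))
    return rows, labels

rng = random.Random(0)  # fixed seed (an assumption) for repeatable runs
conditions = {}
for name, concept in (("parity", parity), ("conjunction", conjunction)):
    for n_irrelevant in (0, 3):
        # 20 runs per condition, each with 200 training and 100 test cases.
        conditions[(name, n_irrelevant)] = [
            (make_dataset(concept, 5, n_irrelevant, 200, rng),
             make_dataset(concept, 5, n_irrelevant, 100, rng))
            for _ in range(20)
        ]
```

A learning curve for one condition would then be obtained by training on growing prefixes of each training set and averaging test accuracy over the 20 runs.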

We had a number of hypotheses about the outcomes of this study.
First, we expected C4.5 to be unaffected by irrelevant
attributes in the conjunctive domain, but to suffer on the
parity concept, because none of the five relevant features would
appear diagnostic in isolation. In contrast, we predicted that
nearest neighbor would suffer equally on both target concepts,
but that Oblivion’s ability to remove irrelevant features even
in the presence of feature interaction would let it scale well
on both concepts. Finally, we hypothesized that Oblivion’s
learning curve would closely follow that for nearest neighbor
when no irrelevant features were present, but that it would
mimic C4.5 in the absence of feature interactions.

Figure 1 (a) shows the learning curves on the parity target
concept when only the five relevant attributes and no irrelevant
ones are present in the data. In this experimental condition,
nearest neighbor and Oblivion increase their accuracy at the
same rate, but surprisingly, C4.5 actually learns somewhat more
rapidly. The situation changes drastically in Figure 1 (b),
which presents the results when there are three irrelevant
features. Here the learning curves for both nearest neighbor
and C4.5 have flattened considerably. In contrast, the learning
rate for Oblivion is almost unaffected by their introduction. A
different situation holds for the conjunctive target concept
(not shown). In this case, all three algorithms require about
the same number of instances to reach perfect accuracy when no
irrelevant features are present, with nearest neighbor taking a
surprise lead in the early part of training. The introduction
of irrelevant attributes affects nearest neighbor the most, and
C4.5’s learning curve is somewhat less degraded than that for
Oblivion.

Figure 1. Learning curves for nearest neighbor, C4.5 without
pruning, and Oblivion on the five-bit parity concept given (a)
zero irrelevant attributes and (b) three irrelevant attributes.
The error bars indicate 95% confidence intervals.

These results support our hypothesis about Oblivion’s ability
to scale well to domains that have both irrelevant features and
interaction among relevant attributes. However, we also wanted
to evaluate the importance of this finding on natural data.
Holte’s (1993) results with the UCI repository suggest that
these domains contain many irrelevant features but few
interactions among relevant ones; in this case, we would expect
C4.5 and Oblivion to outperform nearest neighbor on them. But it
is equally plausible that these domains contain many relevant
but redundant attributes, in which case we would observe little
difference in learning rate among the three algorithms.

In four of the UCI domains - Congressional voting, mushroom,
DNA promoters, and breast cancer - we found little difference in
the behavior of Oblivion, C4.5, and nearest neighbor. All three
algorithms learn rapidly and the learning curves (not shown) are
very similar. Inspection of the decision trees learned by C4.5
and Oblivion in two of these domains revealed only a few
attributes. Combined with the fact that nearest neighbor
performs at the same level as the other methods, this is
consistent with the latter explanation for Holte’s results, that
these domains contain largely redundant features.¹

¹ A forward-selection variant of Oblivion (basically a greedy
version of the Focus algorithm) also produced very similar
curves on these domains, providing further evidence that they do
not involve both feature interactions and irrelevant attributes.

One domain in which Holte found major differences was king-rook
vs. king-pawn chess endgames, a two-class data set that includes
36 nominal attributes. This suggested that it might contain
significant attribute interactions, and thus might give
different outcomes for the three algorithms. Figure 2 (a) gives
the resulting learning curves, averaged over 20 runs, in which
Oblivion’s accuracy on the test set is consistently about ten
percent higher than that for nearest neighbor, though presumably
the latter would eventually catch up if given enough instances.
However, C4.5
reaches a high level of accuracy even more rapidly than
Oblivion, suggesting that this domain contains many irrelevant
attributes, but that there is little interaction among the
relevant ones. Inspection of the decision trees that C4.5
generates after 500 instances is consistent with this account,
as they contain about ten of the 36 attributes, but only a few
more terminal nodes than levels in the tree, making them nearly
linear and thus in the same difficulty class as conjunctions.

Figure 2 (b) shows encouraging results on another domain, this
time averaged over ten runs, that involves prediction of a
word’s specific semantic class from the surrounding context in
the sentence. These data include 35 nominal attributes (some
with many possible values) and some 40 word classes. Nearest
neighbor does very poorly on this domain, suggesting that many
of the attributes are irrelevant. Inspection of C4.5’s and
Oblivion’s output, which typically retain about half of the
attributes, is consistent with this explanation. In the latter
part of the learning curves, Oblivion’s accuracy pulls slightly
ahead of that for C4.5, but not enough to suggest significant
interaction among the relevant attributes. Indeed, Cardie (1993)
reports that (on a larger training set) nearest neighbor
outperforms C4.5 on this task when the former uses only those
features found in the latter’s decision tree. This effect cannot
be due to feature interaction, since it relies on C4.5’s greedy
forward search to identify features; instead, it may come from
the different representational biases of decision trees and
case-based methods, which would affect behavior on test cases
with imperfect matches.

Figure 2. Predictive accuracy as a function of training
instances for nearest neighbor, C4.5 with pruning, and Oblivion
on (a) classifying chess endgames and (b) predicting a word’s
semantic class.

The above findings indicate that many of the available data
sets contain few truly irrelevant features, and none of these
appear to involve complex feature interactions. These
observations may reflect preprocessing of many of the UCI
databases by domain experts to remove irrelevant attributes and
to replace interacting features with better terms. The voting
records, which contain only 16 key votes as identified by the
Congressional Quarterly, provide an extreme example of the first
trend. As machine learning starts to encounter new domains in
which few experts exist, such data sets may prove less
representative than artificial ones.

The experiments with artificial domains, reported earlier,
revealed clear differences in the effect of irrelevant
attributes and feature interactions on the behavior of nearest
neighbor, C4.5, and Oblivion. The rate of learning for the
nearest neighbor method decreased greatly with the addition of
irrelevant features, regardless of the target concept. In
contrast, irrelevant attributes hurt C4.5 for the five-bit
parity concept but not the five-feature conjunction; top-down
greedy induction of decision trees scales well only when the
relevant features (individually) discriminate among the classes.
In contrast, the learning rate for Oblivion was largely
unaffected by irrelevant features for either the conjunctive or
parity concepts, presumably because its greedy pruning method
was not misled by interactions among the relevant features.

4.  Discussion 

We have already reviewed the previous research that led to our
work on Oblivion, and we have drawn some tentative conclusions
about the algorithm’s behavior from our experimental results.
Here we consider some additional related work on induction,
along with directions for future research.

Kira and Rendell (1992) have followed a somewhat different
approach to feature selection. For each attribute A, their
Relief algorithm assigns a weight W_A that reflects the relative
effectiveness of that attribute in distinguishing the classes.
The system then selects as relevant only those attributes with
weights that exceed a user-specified threshold, and passes these
features, along with the training data, to another induction
algorithm such as ID3. Comparative studies on two artificial
domains with feature interactions showed that, like Focus, the
Relief algorithm was unaffected by the addition of irrelevant
features on noise-free data, and that it was less affected than
Focus (and much more efficient) on noisy data.
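
The weighting idea can be sketched as follows, using the commonly described form of Relief for nominal attributes: each sampled instance is compared with its nearest neighbor of the same class (the hit) and of a different class (the miss), and an attribute gains weight when it separates the instance from the miss but not from the hit. The function names, the sample count, the seed, and the toy parity data are assumptions made for illustration, and the threshold step is omitted.

```python
import random
from itertools import product

def diff(a, b):
    """Difference between two nominal values: 0 if equal, 1 otherwise."""
    return 0.0 if a == b else 1.0

def distance(x, z):
    """Number of attributes on which two instances disagree."""
    return sum(diff(a, b) for a, b in zip(x, z))

def relief(rows, labels, n_samples, rng):
    """Estimate one relevance weight per attribute."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    for _ in range(n_samples):
        i = rng.randrange(len(rows))
        x, y = rows[i], labels[i]
        hits = [z for j, (z, c) in enumerate(zip(rows, labels))
                if j != i and c == y]
        misses = [z for z, c in zip(rows, labels) if c != y]
        hit = min(hits, key=lambda z: distance(x, z))    # nearest same-class
        miss = min(misses, key=lambda z: distance(x, z))  # nearest other-class
        for a in range(n_features):
            weights[a] += (diff(x[a], miss[a]) - diff(x[a], hit[a])) / n_samples
    return weights

# Three-bit parity over features 0-2; features 3 and 4 are irrelevant.
rows = list(product((0, 1), repeat=5))
labels = [(x[0] + x[1] + x[2]) % 2 for x in rows]
weights = relief(rows, labels, 100, random.Random(0))
print([round(w, 2) for w in weights])
```

On this noise-free parity data the relevant attributes can only gain weight and the irrelevant ones can only lose it, even though no single attribute is informative in isolation.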

The above algorithms filter attributes before passing them to
ID3, but John, Kohavi, and Pfleger (in press) have explored a
wrapper model that embeds a decision-tree algorithm within the
feature selection process, and Caruana and Freitag (in press)
have described a similar scheme. Each examined greedy search
through the attribute space in both the forward and backward
directions, including variants that supported bidirectional
search. John et al. found that backward elimination produced
more accurate trees than C4.5 in two domains but no differences
in others, whereas Caruana and Freitag reported that all of
their attribute-selection methods produced improvements over
(unpruned) ID3 in a single domain.

One can also combine the wrapper idea with nearest-neighbor
methods, as in Oblivion. Skalak (in press) has recently examined
a similar approach, using both Monte Carlo sampling and random
mutation hill climbing to select cases for storage, with
accuracy on the training set as his evaluation measure. Both
approaches led to reductions in storage costs on four domains
and some increases in accuracy, and the use of hill climbing to
select features gave further improvements. Moore, Hill, and
Johnson (in press) have also embedded nearest neighbor methods
within a wrapper scheme. However, their approach to induction
searches not only the space of features, but also the number of
neighbors used in prediction and the space of combination
functions. Using a leave-one-out scheme to estimate accuracy on
the test set, they have achieved significant results on two
control problems that involve the prediction of numeric values.

Some researchers have extended the nearest neighbor approach
to include weights on attributes that modulate their effect on
the distance metric. For example, Cain et al. (1991) found that
weights derived from a domain theory increased the accuracy of
their nearest-neighbor algorithm. Aha (1990) presented an
algorithm that learned the weights on attributes, and showed
that its empirical sample complexity grew only linearly with the
number of irrelevant features, as compared to exponential growth
for simple nearest neighbor. In principle, proper attribute
weights should produce more accurate classifiers than variants
that simply omit features. However, search through the weight
space involves more degrees of freedom than Oblivion’s search
through the attribute space, making their relative accuracy an
open question for future work.
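
The relation between weighting and selection can be shown with a brief sketch. It assumes nominal attributes and a weighted count of mismatches as the distance; feature selection then corresponds to the special case in which every weight is 0 or 1. The function names are hypothetical, and the sketch does not attempt to reproduce how either Cain et al. or Aha obtained their weights.

```python
def weighted_distance(x, z, weights):
    """Weighted count of the attributes on which two instances disagree."""
    return sum(w * (a != b) for w, a, b in zip(weights, x, z))

def nearest_label(rows, labels, weights, query):
    """Class of the stored case nearest to the query under the weights."""
    best = min(range(len(rows)),
               key=lambda i: weighted_distance(rows[i], query, weights))
    return labels[best]

rows = [(0, 0), (1, 1)]
labels = ["a", "b"]
query = (0, 1)

# With 0/1 weights, the choice of weights is a choice of features.
first_only = nearest_label(rows, labels, (1, 0), query)
second_only = nearest_label(rows, labels, (0, 1), query)
print(first_only, second_only)
```

The same query is assigned to different classes depending on which attribute carries the weight. Real-valued weights extend this choice from a subset of attributes to a point in a continuous space, which is the larger search space referred to above.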

Clearly, our experimental results are somewhat mixed and call
out for additional research. Future studies should examine other
natural domains to determine if feature interactions arise in
practice. Also, since Oblivion uses the leave-one-out scheme to
estimate accuracy, we predict it should handle noise well, but
we should follow Kira and Rendell’s lead in testing this
hypothesis experimentally. Oblivion’s simplicity also suggests
that an average-case analysis would prove tractable, letting us
compare our experimental results to theoretical ones. We should
also compare Oblivion’s behavior to other methods for selecting
relevant features, such as those mentioned above.

Despite the work that remains, we believe that our analysis
has revealed an interesting relation between oblivious decision
trees and abstract cases, and that our experiments provide
evidence that one such algorithm outperforms simpler case-based
learning methods in domains that involve irrelevant attributes.
We anticipate that further refinements to Oblivion will produce
still better results, and that additional experiments will
provide a deeper understanding of the conditions under which
such an approach is useful.